Part Two: Loan Analysis & Prediction

University of Maryland - Info Challenge 2022

Team 32: Danny Rivas, Javan Reuto

Date: 03/05/22


Loan Analysis

After cleaning our data in the Part One, we are now ready to start analyzing the data.

First, we'll import our libraries.

After importing relevant libraries, we'll import the removed loans dataset and the dataset will both removed and full loans.

Defining Characteristics Of Removed Loans

Before comparing Removed Loans to Full Loans, we need to investigate the Removed Loans.

Jobs Retained by Business Type

We created a dataframe that groups Removed Loans by business type, average jobs retained and average loan amount. We want to investigate if there are any business types that have more workers that may need higher loans.

Observation:

We can see how loans differ in the average number of jobs retained and loan amount requested. On average, corporations had the highest number of jobs retained. Interestingly, organizations that were categorized as Tax-Exempt Nonprofits requested the most amount of money. We can assume that perhaps some of the Tax-Exempt Nonprofits did not need the loans as much as others and were asked to pay back the loans.

Types of Businesses & Total Loans

While Tax-Exempt Nonprofits recieved the highest loan amount average, this group of organizations didn't recieve the most loans. We will now create a stacked histogram by Business Type and Sum of Loans. We'll color code the bars with LMI indicator.

The LMI indicator shows whether or not a business is in a Low-Moderate income community. According to the SBA, the "critical goal from Congress for the 2021 round of PPP was to reach small and low- and moderate-income (LMI) businesses who have not received the needed relief a forgivable PPP loan provides." In other words, PPP loans were intended on helping relatively-low income communities.

Inference:

Over $300 million were dispersed to Sole Proprietorships. Of these loans, more than half went to business that were not in a LMI community. In addition, the majority of loans that were given to other business types were not part of a LMI community. We can assume being in a LMI community, played a role in whether or not a loan was removed.

NAICS Name

The North American Industry Classifcation System (NAICS), is used to classify businesses by their activity. Each business in our datasets had a NAICS code, in our code preparation we mapped NAICS name to the codes. Given NAICS name, we can see which business activity appeared the most in the Removed loans dataset.

With this interactive heatmap, we can zoom in on different areas of the figure. If we zoom in to the bottom left corner we can find insightful information.

Inference:

We can see that businesses that appeared the most in removed loans, were Sole Proprietorships and Independent Contractors. Amongst highest Business Classes that were removed are:

From an accounting standpoint, it's highly likely that these businesses operate on a cash basis. In other words, cash is recognized when it's recieved rather than when a service or product is sold. Typically smaller businesses use this method of accounting which also comes with less public scrutiny from investors, such as the Securities & Exchange commission. What does this mean? These businesses are vulnerable to accounting errors that may be fraudulent.

Mapping Poverty Rates & Loan Amounts by City

Using Latitude and Longitude data, we can further investigate what cities recieved the higher loans on average and gauge the economic state of each one. We'll include poverty percentage to see how healthy city economies are. Ideally, we would like to see areas with higher poverty rates with higher loan amounts.

Inference:

We can see how cities with lower poverty rates, recieved the higher loans on average. While all of the loans in this visualization are removed, we can see a trend. Cities that generally have a healthier economy, with more people employed, recieved higher loan amounts. Of course, if these areas are "thriving" then it's possible the SBA saw repayment of those loans fit.

Removed Loans Versus All Loans

Now we'll compare the removed loans to all the loans in the full dataset. First we'll group the data by LMI hubs. To create a sunburst chart to compare the two datasets.

Demographic Analysis

We can gain more insight by looking at city demographics. Below we'll make a scatter plot for each city. We'll look at how Removed and Unremoved loans compare in diversity, poverty levels, and average loan amounts.

Observation:

Above, we can see there is a relatively strong correlation between diversity levels and poverty levels. As Diversity, or number of minorities in a city increase, so does the level of poverty. In addition, the size of each marker represents the average loan size for a specific city. The higher the marker the higher the average loan size in that city. We can see that cites that had lower diversity levels recieved more loans and larger loan sizes. While this chart does not give us a reason as to why some loans were removed, we can see that loans that were removed recieved smaller loans on average.

Loans Over Time

The last metric we want to look at are loans approved over time. This may provide insight on what time of the year was it ideal to apply for a loan.

Inference:

At the beginning of the pandemic, we see a peak in number of loans approved. These loans all went to loans that were not removed by the SBA. Towards the beginning of March 2021, we can see that loans that were eventually removed from the dataset, started to appear. It's possible that loans that were removed, were loans that were not forgiven. We know now that loans given at the beginning of the pandemic were eventually forgiven. Given there was a second wave in COVID-19 cases, it's possible that more people started to apply, and the SBA had measures in place to perhaps better determine the qualifcations of a loan applicant.

Loan Prediction

%matplotlib inline

import seaborn as sns sns.set_style('dark')

plt.figure(figsize=(12,8)) sns.heatmap(ga_1_balanced.corr(), cmap='PuBu', annot=True, fmt='.1f', linewidths=.1) plt.show()